Statistical Disclosure Control

Software testing (WP 6)

Leading partner: IStat

Participating partners: IStat, CBS, ONS, StBA, TUIlm, IDesCat, INE

Objectives

The general aim of this workpackage is to test the capabilities and to establish correct implementation of the ARGUS software. Any software application has to be tested before it is released, in order to check its reliability, user-friendliness, performance and portability.
In the course of the project, the functionality of ARGUS will be improved by incorporating several of the techniques developed by different partners. The resulting software implementation will consist of new program modules to be integrated into ARGUS. We distinguish two kinds of software testing. The first is the testing of each module in order to check the reliability and performance of the routines to be introduced into ARGUS. This will be done by the partner responsible for the development of the technique, in co-operation with the software developers (namely contractor 1, who will take care of the process of software integration). The second form of testing aims at checking the reliability and capabilities of ARGUS from a user's perspective. This will be performed in two rounds during the project and will involve a group of test experts as described in Task TS1.


The workpackage is outlined in the following tasks:

Task TS1 (Responsible IStat)

Objectives

To co-ordinate the testing of the µ-ARGUS and τ-ARGUS software among partners and to test the software.

Description of the work

Creation of a questionnaire that will guide testers through the process of checking the software. Apart from the obvious checking for bugs and other implementation difficulties, special attention will be paid both to possible applications and to potential problems associated with dissemination in practice.
µ-ARGUS: In order to test the capabilities of the new module for individual risk in complex microdata, µ-ARGUS will be tested on both hierarchical data and panel data. For the first case we intend to use the Labour Force Survey, whereas for the second we will use the Italian data from the European Community Household Panel.
τ-ARGUS: A few surveys (for example the Structural Business Statistics survey, common to all National Statistical Institutes as well as Eurostat) will be carefully examined and the structure of the tabulation plan will be reproduced by applying τ-ARGUS. The whole process of disclosure control, from the input of the data into ARGUS to the output of the disclosure-controlled data and the production of the report, needs to be rigorously tested. Ways to improve the flexibility of the software will be thoroughly investigated and the integration of τ-ARGUS into the tabular data production process should be evaluated. Moreover, different types of input (microdata and tabular data) in a variety of formats (ASCII, SAS and Oracle) will be checked.

Milestones and expected result

Improvements in managing input and output and in user friendliness. Rigorous checks of the capabilities of the software.


Task TS2: Testing of τ-ARGUS

Objectives

TS2.1 (responsible StBA) First user experience. Testing software reliability and capability of τ-ARGUS.
TS2.2 (responsible StBA) Testing software reliability concerning correct implementation of the GHQUAR algorithm (cf. WP3 and WP4.2).
TS2.3 (responsible StBA) Testing software reliability concerning the cell-suppression algorithms developed in WP 4.1.
TS2.4 (responsible IDesCat) New procedures for tabular data in τ-ARGUS based on cell suppression by network flow techniques will be tested, particularly those involving large tables and dealing with business and/or social data.
TS2.5 (responsible ONS) Testing software reliability and applicability to UK Official Business Statistics data.

Description of work

Task TS2.1 Gain first user experience. Test software reliability and capability of τ-ARGUS, following the items of the questionnaire created as part of task TS1.
Task TS2.2 Check correct implementation of the GHQUAR algorithm (cf. WP3 and WP4.2).
Task TS2.3 Check software reliability concerning the cell-suppression algorithms developed in WP 4.1.
Task TS2.4 Testing the performance of new tools based on network flow techniques applied to automated secondary cell suppression, using large data sets in tabular format. Because table dimension becomes a decisive attribute, the aim is to consider in particular the computational requirements and the facilities these developments offer for analysing the outputs. The performance of the procedures on business and/or social (personal) data will also be analysed. All tests will be carried out on real data, and the results will be reported separately according to the nature of the data and the analysis performed.
Task TS2.5 Testing software reliability and usefulness on practical official business statistics tables: application to Interdepartmental Business Register and/or Annual Business Inquiry.
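
As an illustration of the kind of automated check that could support Tasks TS2.2 to TS2.4, the sketch below flags suppressed cells in a two-dimensional table that are the only suppressed cell in their row or column. When row and column totals are published, such a cell can be recovered by simple subtraction, so the check tests a necessary (not a sufficient) condition for a valid suppression pattern. The function name and the boolean-matrix representation are assumptions made for this example; they are not part of τ-ARGUS.

```python
def unprotected_suppressions(suppressed):
    """Return (row, col) positions of suppressed cells that can be
    recovered by subtraction from published marginal totals.

    `suppressed` is a list of rows of booleans (True = cell suppressed),
    covering the interior cells of a table whose row and column totals
    are assumed to be published. Hypothetical helper, not a tau-ARGUS API.
    """
    n_rows = len(suppressed)
    n_cols = len(suppressed[0]) if n_rows else 0

    # Number of suppressed cells in each row and in each column.
    row_counts = [sum(row) for row in suppressed]
    col_counts = [sum(suppressed[r][c] for r in range(n_rows))
                  for c in range(n_cols)]

    # A suppressed cell that is alone in its row or column is exposed.
    return [(r, c)
            for r in range(n_rows)
            for c in range(n_cols)
            if suppressed[r][c]
            and (row_counts[r] == 1 or col_counts[c] == 1)]
```

An empty result does not prove that a pattern is safe, because it ignores interval disclosure and hierarchical or linked tables; it only rules out the most elementary failure.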

Milestones and expected result

Task TS2.1 Rigorous check of τ-ARGUS.
Task TS2.2 Correct implementation of GHQUAR.
Task TS2.3 Assess correct implementation of cell-suppression algorithms developed in WP 4.1.
Task TS2.4 Enhance future ARGUS software versions with respect to some results of the WP 4.1 methods related to tabular data.
Task TS2.5 Practical check and assessment of suitability of τ-ARGUS for Business Statistics Data reports at a National Statistics Institute.

Task TS3: Testing of µ-ARGUS

Objectives

TS3.1 (responsible IDesCat) The aim of this task is to test extended versions of µ-ARGUS, in particular those incorporating the methods for microaggregation and aggregation of qualitative data developed in WP1.1 and implemented in WP2. Special consideration will be given to the balance between information loss and data quality.
TS3.2 (responsible StBA) The goal of this task is to test µ-ARGUS concerning the masking algorithm. The results will be used as far as possible to improve the implementation.
TS3.3 (responsible ONS) Testing software reliability and applicability to UK Official Social Statistics data, e.g. the Labour Force Survey or the General Household Survey.

Description of work

Task TS3.1 The purpose will be to test two new methods of µ-ARGUS, namely microaggregation (for quantitative data) and aggregation of qualitative data, using real data provided by URV. The results of validation will include not only comments on the running time and statistical quality but also some practical issues for end users. Whenever possible and sensible, testing will also include a preliminary comparative analysis of the methods with respect to information loss and data quality. Preliminary findings on applying both modules, together with a re-identification model, to obtain safe samples of microdata in a real situation of unique populations will also be considered.
Task TS3.2 Test two preliminary versions of µ-ARGUS concerning the correct implementation of the masking algorithm developed as a task of WP 1.1.
Task TS3.3 Testing software reliability and usefulness on a practical official social statistics problem: application to the Labour Force Survey or the General Household Survey.
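
To make the comparison of information loss mentioned under Task TS3.1 concrete, the following is a minimal sketch of fixed-size univariate microaggregation with a simple loss measure. It is not the method developed in WP1.1 nor the µ-ARGUS implementation: the grouping rule (sort, then take consecutive groups of k) and the loss measure (within-group sum of squares divided by total sum of squares) are assumptions chosen for illustration, and both function names are hypothetical.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k records.

    Illustrative fixed-size rule: sort the values, cut them into
    consecutive groups of k, and merge a short final group into the
    previous one so that no group has fewer than k members.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)

    masked = [0.0] * len(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            masked[i] = centroid
    return masked


def information_loss(original, masked):
    """Within-group sum of squares as a share of the total sum of squares.

    0 means no information lost; 1 means every value was replaced by
    the overall mean.
    """
    overall = mean(original)
    sse = sum((o - m) ** 2 for o, m in zip(original, masked))
    sst = sum((o - overall) ** 2 for o in original)
    return sse / sst if sst else 0.0
```

Running the pair of functions for several values of k on the same variable gives a simple loss curve against which the running time and the resulting disclosure risk can be set.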

Milestones and expected result

Task TS3.1 Enhance future ARGUS software versions with respect to some results of the WP 1.1 methods related to microdata, including rules on safe samples of microdata in a real situation.
Task TS3.2 Improvements in user-friendliness of the program, making the program and the algorithms implemented in it a useful and easily usable tool for potential users.
Task TS3.3 Practical check and assessment of suitability of µ-ARGUS for Social Statistics Data returns at a National Statistics Institute.

Task TS4 (responsible INE)

Objectives

Testing and validation are justified in their own right: they ensure the necessary link between the developers' advances, the quality of their results and the effective applicability of the new methods incorporated into successive versions of µ-ARGUS and τ-ARGUS. Testing will focus especially on the real usefulness of the methods as a standard tool for disclosure control of statistical data at NSIs.
Task TS4.1 The aim of this task is to test some extended versions of µ-ARGUS, especially those related to the methods for microaggregation and aggregation of qualitative data, as well as record masking if it is implemented in new developments of this software.
Task TS4.2 A complementary task will be to analyse the performance of µ-ARGUS in the disclosure control of uniqueness in populations.
Task TS4.3 New procedures for tabular data in τ-ARGUS will also be tested, particularly those involving very large tables and/or hierarchical/linked structures and mainly based on cell suppression methods such as the multicommodity flow approach.

Description of the work

Task TS4 The whole workpackage consists of testing, that is, evaluating and validating every (new) implemented method for both microdata and tabular data. Special attention will be devoted to validating the options for running tools and methods automatically, as well as to quantifying the computational requirements under different environments (computing time and other parameters) in the practical daily work of the NSI. Whenever possible and sensible, testing will also include a preliminary comparative analysis of the methods with respect to information loss and data quality. All tests will be carried out on real data, and the results will be reported separately according to the method tested and the analysis performed.
Task TS4.1 testing of µ-ARGUS: The purpose will be to test two new methods of µ-ARGUS, namely microaggregation (quantitative data) and aggregation of qualitative data, using real data. Any record masking techniques implemented will also be tested. The results of validation will include not only comments on the running time and statistical quality but also some practical issues for end users.
Task TS4.2 performance of µ-ARGUS on uniqueness in populations: The main task will be to extend the detailed findings on (new versions of) µ-ARGUS when it is used, together with a re-identification model, to obtain safe samples of microdata files in a real situation of unique populations, with large files and many re-identification keys. The analysis of the results obtained with µ-ARGUS will centre on the features offered by some of the newly implemented methods, such as conditional recoding (as well as aggregation of qualitative data, record masking or others), and will be carried out in terms of the protection criteria applied, the amount of information loss and the resulting risk of statistical disclosure.
Task TS4.3 testing of τ-ARGUS: Testing the performance of new tools based on multicommodity flow networks (see workpackage WP 4.1) applied to automated secondary cell suppression, using large data sets in tabular format. Because table dimension becomes a decisive attribute, the aim is to consider in particular the computational requirements and the facilities these developments offer for analysing the outputs. The performance of the procedures when there is some hierarchical and/or linked structure will be analysed. A brief comparative analysis with other cell suppression approaches already available in τ-ARGUS will also be produced.
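
For the uniqueness analysis in Task TS4.2, one simple quantity a tester might track is the number of records that are unique in the file on a chosen set of re-identification keys, before and after a recoding step. The sketch below is an assumption-laden illustration: the record layout, the variable names and the age-banding recode are hypothetical, and uniqueness in a sample file is only a proxy for uniqueness in the population, which is what a re-identification model would actually estimate.

```python
from collections import Counter


def sample_uniques(records, keys):
    """Return the records whose combination of key values occurs once.

    `records` is a list of dicts and `keys` the names of the
    re-identification key variables (both hypothetical layouts).
    """
    combos = Counter(tuple(rec[k] for k in keys) for rec in records)
    return [rec for rec in records
            if combos[tuple(rec[k] for k in keys)] == 1]


def recode_age(records, width):
    """Global recoding: replace age by the lower bound of its band."""
    return [{**rec, "age": rec["age"] // width * width} for rec in records]


# Toy data for illustration only; not taken from any real survey.
EXAMPLE = [
    {"region": "N", "age": 34, "sex": "F"},
    {"region": "N", "age": 34, "sex": "F"},
    {"region": "N", "age": 35, "sex": "F"},
    {"region": "S", "age": 71, "sex": "M"},
]
KEYS = ("region", "age", "sex")

uniques_before = len(sample_uniques(EXAMPLE, KEYS))
uniques_after = len(sample_uniques(recode_age(EXAMPLE, 10), KEYS))
```

Comparing `uniques_before` with `uniques_after` alongside an information-loss measure gives one row of the protection-versus-loss comparison the task describes.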

Milestones and expected result

Task TS4 The test reports will help to enhance the ARGUS software.